Distributions
Humboldt-Universität zu Berlin
Clam00 3amte Wed 000, 25, 2023
Last week we…
what do the vectors x and y contain?
The object x contains 1, 2, 3.
The object y contains 6.
Today we will learn…
ggplot
In your RProject folder…
moodle
notes called 02-datenviz1.R script
packages: tidyverse, languageR, ggthemes, patchwork
data frames are a collection of variables, where
data frames are just like spreadsheets, but are rectangular
different words for data frames:
tidyverse)
when we talk about our data, we use certain words to refer to different parts:
a variable: a quantity, quality, or property you can measure
a value: the state of a variable when you measure it
an observation: set of measurements made under similar conditions
tabular data is a set of values, each associated with a variable and an observation
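To make these three terms concrete, here is a minimal sketch with a made-up data frame. The object name df_toy, its column names, and its values are hypothetical and only for illustration; they are not part of the course data.

```r
# hypothetical toy data, only to illustrate the vocabulary
df_toy <- data.frame(
  participant = c("p1", "p2", "p3"),  # a variable (one column)
  rt_ms       = c(512, 640, 488),     # another variable
  is_word     = c(TRUE, FALSE, TRUE)  # and a third
)

ncol(df_toy)      # number of variables: 3
nrow(df_toy)      # number of observations (rows): 3
df_toy[2, ]       # one observation: all measurements for p2
df_toy$rt_ms[2]   # one value: 640
```

Each row is one observation, each column is one variable, and each cell holds a single value.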
our first dataset contains data from a lexical decision task (LDT)
in the LDT, participants press a button to indicate whether a word is a real word or a pseudoword
lexdec dataset
languageR is a companion package for the textbook Baayen (2008)
the lexdec dataset contains data for a lexical decision task in English
lexdec variables
df_lexdec: Lexical decision latencies elicited from 21 subjects for 79 English concrete nouns, with variables linked to subject or word.
| variable | description |
|---|---|
| Subject | a factor for the subjects |
| RT | a numeric vector for log-transformed reaction times (exp(RT) gives milliseconds) |
| Trial | a numeric vector for the rank of the trial in the experimental list. |
| Sex | a factor with levels F (female) and M (male). |
| NativeLanguage | a factor with levels English and Other, distinguishing between native and nonnative speakers of English |
the lexdec data comes from the languageR package we’ve already loaded
  Subject       RT Trial Sex NativeLanguage Correct PrevType PrevCorrect
1      A1 6.340359    23   F        English correct     word     correct
2      A1 6.308098    27   F        English correct  nonword     correct
3      A1 6.349139    29   F        English correct  nonword     correct
4      A1 6.186209    30   F        English correct     word     correct
5      A1 6.025866    32   F        English correct  nonword     correct
6      A1 6.180017    33   F        English correct     word     correct
df_lexdec, which means “data frame lexical decision”
Exercise 1: ?lexdec
Find out what the other variables represent by running ?lexdec in the console.
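A minimal sketch of how the data might be loaded and inspected. It assumes the languageR package is installed; storing the dataset with a plain assignment under the name df_lexdec is an assumption about how the object was created.

```r
library(languageR)  # provides the lexdec dataset

# store the dataset under the name used in these slides (assumed assignment)
df_lexdec <- lexdec

head(df_lexdec)     # first six rows
names(df_lexdec)    # names of all variables
# ?lexdec           # run this in the console to open the help page
```

Note that RT is stored on the log scale here, which is why the first values are around 6 rather than several hundred milliseconds.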
we’ll plot the distribution (count) of reaction times and native language of the participants
ggplot2
tidyverse is a collection of packages that facilitate data tidying and visualisation
when we load the tidyverse, this collection of packages is automatically loaded
the ggplot2 package is a tidyverse package that builds plots in layers
ggplot2 layers
Figure 1: Example of layers in a ggplot figure
ggplot() is like an empty canvas
next we tell ggplot() how to visually represent our variables
we add + to the end of our line of code, and on a new line of code use the function aes() to define our aesthetics
we map reaction times (RT) on the x-axis (the bottom of the plot)
we wrap RT in the exp() function to get RTs in milliseconds (for reasons we won’t discuss)
Exercise 2: Mapping aesthetics
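A minimal sketch of the first two layers, the empty canvas and the aesthetic mapping. The object names fig_canvas and fig_axes are hypothetical, and df_lexdec is assumed to be a copy of lexdec from languageR.

```r
library(tidyverse)
library(languageR)

df_lexdec <- lexdec  # assumed: same data as in the slides

# layer 1: an empty canvas that knows which data to use
fig_canvas <- ggplot(data = df_lexdec)

# layer 2: map back-transformed reaction times to the x-axis
fig_axes <- fig_canvas +
  aes(x = exp(RT))

fig_axes  # draws an x-axis in milliseconds, but nothing else yet
```

Nothing is drawn inside the plot at this point because we have not yet added a geom.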
next we tell ggplot() how to visualise them
in ggplot2, geom functions start with geom_
bar charts use bar geoms (geom_bar()), line charts use line geoms (geom_line()), scatterplots use a point geom (geom_point()), etc.
for a histogram we add geom_histogram()
Note
We got the following message when including geom_histogram():
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
This is just telling us about the width of our bars: each bar represents a range of possible reaction time values. bins = 30 simply means there are 30 bars; we can change this to have more or fewer bars by including e.g., bins = 20 or bins = 100 inside geom_histogram().
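To see what the bins argument does, here is a small sketch that builds the same histogram with fewer and with more bars. The object names fig_bins20 and fig_bins100 are hypothetical, and df_lexdec is assumed to be a copy of lexdec.

```r
library(tidyverse)
library(languageR)

df_lexdec <- lexdec  # assumed: same data as in the slides

# fewer, wider bars
fig_bins20 <- ggplot(data = df_lexdec) +
  aes(x = exp(RT)) +
  geom_histogram(bins = 20)

# more, narrower bars
fig_bins100 <- ggplot(data = df_lexdec) +
  aes(x = exp(RT)) +
  geom_histogram(bins = 100)

fig_bins20
fig_bins100
```

Setting bins (or binwidth) yourself also makes the message go away, since ggplot2 no longer has to pick a default.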
# stacked (default) vs. layered histograms, side by side (needs the patchwork package)
ggplot(
  data = df_lexdec,
  aes(x = exp(RT), fill = NativeLanguage)
) +
  labs(title = "Stacked") +
  geom_histogram() +
  ggplot(
    data = df_lexdec,
    aes(x = exp(RT), fill = NativeLanguage)
  ) +
  labs(title = "Layered: position = \"identity\"") +
  geom_histogram(position = "identity") +
  plot_layout(guides = "collect") & theme(legend.position = 'bottom')
to make the overlapping bars see-through, add alpha = 0.3 to geom_histogram()
Exercise 3: Histogram transparency
Play around with the transparency of the histogram geom. Choose the alpha-value you prefer. The output should look something like this:
we can improve our axis and legend labels, and also add titles using the labs() function
let’s also use the function scale_fill_colorblind() from the ggthemes package
we’ll also use the theme_minimal() function from ggplot2; what does this do?
try to add the following to your plot
## histogram of reaction times by native language
ggplot(data = df_lexdec) +
  aes(x = exp(RT), fill = NativeLanguage) + # set aesthetics
  labs(title = "Reaction times by L1",
       x = "Reaction times (ms)") +
  geom_histogram(position = "identity", alpha = 0.3) +
  scale_fill_colorblind() + # make fill colorblind friendly
  theme_minimal() # set plot theme
we can save a plot as an object; give it a meaningful name (not e.g., fig1 or xyz)
we’ll call ours fig_lexdec_rt, for “figure lexical decision task reaction times”
Exercise 4: ggplot2 review
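save the histogram as an object called fig_lexdec_rt
copy the code for the plot (fig_lexdec_rt) and, in the copy, put NativeLanguage on the x-axis and replace geom_histogram() with geom_bar()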
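One way this exercise could look, as a sketch rather than the official solution. The bar plot's title and axis label are made up, and df_lexdec is assumed to be a copy of lexdec.

```r
library(tidyverse)
library(ggthemes)
library(languageR)

df_lexdec <- lexdec  # assumed: same data as in the slides

# histogram of reaction times, saved as an object
fig_lexdec_rt <- ggplot(data = df_lexdec) +
  aes(x = exp(RT), fill = NativeLanguage) +
  labs(title = "Reaction times by L1",
       x = "Reaction times (ms)") +
  geom_histogram(position = "identity", alpha = 0.3) +
  scale_fill_colorblind() +
  theme_minimal()

# bar plot: number of observations per native language group
fig_lexdec_l1 <- ggplot(data = df_lexdec) +
  aes(x = NativeLanguage, fill = NativeLanguage) +
  labs(title = "Observations by L1",      # hypothetical title
       x = "Native language") +           # hypothetical label
  geom_bar() +
  scale_fill_colorblind() +
  theme_minimal()

fig_lexdec_rt  # typing the object name prints the plot
fig_lexdec_l1
```

geom_bar() counts the rows in each category, so we only need to supply an x aesthetic.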
save the new plot as an object (fig_lexdec_l1)
we can combine plots using the patchwork package
use + to connect two plots side-by-side
use / to present them one on top of the other
These exercises should also be included in your script if you upload it to Moodle. Working through the class materials will prepare you for these tasks.
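As a reminder before the tasks below, here is a minimal sketch of the two patchwork operators. The plots are stripped-down stand-ins for the class figures, and the names fig_side_by_side and fig_stacked are hypothetical.

```r
library(tidyverse)
library(patchwork)
library(languageR)

df_lexdec <- lexdec  # assumed: same data as in the slides

# two simple plots to combine (simplified versions of the class figures)
fig_lexdec_rt <- ggplot(data = df_lexdec) +
  aes(x = exp(RT)) +
  geom_histogram(bins = 30)

fig_lexdec_l1 <- ggplot(data = df_lexdec) +
  aes(x = NativeLanguage) +
  geom_bar()

# + places the plots side by side
fig_side_by_side <- fig_lexdec_rt + fig_lexdec_l1

# / places the first plot on top of the second
fig_stacked <- fig_lexdec_rt / fig_lexdec_l1

fig_side_by_side
fig_stacked
```

The combined figure is itself an object, so it can be saved and printed just like a single plot.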
Reproduce our histogram as a density plot by replacing geom_histogram() with geom_density()
Produce a barplot that shows the number of observations per word class (hint: you’ll need the variable Class from our dataset).
Print your density plot and class barplot one on top of the other using the patchwork package
position = "dodge" argument):
Today we learned…
ggplot
Created with R version 4.4.0 (2024-04-24) (Puppy Cup) and RStudio version 2023.3.0.386 (Cherry Blossom).
R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS Ventura 13.2.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/Berlin
tzcode source: internal
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] magick_2.8.3 kableExtra_1.4.0 knitr_1.46 patchwork_1.2.0
[5] ggthemes_5.1.0 languageR_1.5.0 lubridate_1.9.3 forcats_1.0.0
[9] stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2 readr_2.1.5
[13] tidyr_1.3.1 tibble_3.2.1 ggplot2_3.5.1 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] utf8_1.2.4 generics_0.1.3 renv_1.0.7 xml2_1.3.6
[5] stringi_1.8.3 hms_1.1.3 digest_0.6.35 magrittr_2.0.3
[9] evaluate_0.23 grid_4.4.0 timechange_0.3.0 fastmap_1.1.1
[13] rprojroot_2.0.4 jsonlite_1.8.8 fansi_1.0.6 viridisLite_0.4.2
[17] scales_1.3.0 cli_3.6.2 rlang_1.1.3 munsell_0.5.1
[21] withr_3.0.0 yaml_2.3.8 tools_4.4.0 tzdb_0.4.0
[25] colorspace_2.1-0 here_1.0.1 pacman_0.5.1 vctrs_0.6.5
[29] R6_2.5.1 lifecycle_1.0.4 pkgconfig_2.0.3 pillar_1.9.0
[33] gtable_0.3.5 Rcpp_1.0.12 glue_1.7.0 systemfonts_1.0.6
[37] highr_0.10 xfun_0.43 tidyselect_1.2.1 rstudioapi_0.16.0
[41] farver_2.1.1 htmltools_0.5.8.1 labeling_0.4.3 rmarkdown_2.26
[45] svglite_2.1.3 compiler_4.4.0
Week 2 - Data Visualisation 1